Lab 4

Colors

Assigned 2/6/19, Due 2/11/19

Overview

The purpose of this lab is to use color to your advantage. You will be asked to use a variety of color palettes, and use color for its three main purposes: (a) distinguish groups from each other, (b) represent data values, and (c) highlight particular data points.

Data

We’ll be working with the honey production data from #tidytuesday. The repo contains the full data, but we’ll work with just the cleaned-up version in the honeyproduction.csv file, which is posted on Canvas or can be obtained by downloading the zip file from the repo.

Assignment

  1. Visualize the total production of honey across years by state. Use color to highlight the west coast (Washington, Oregon, and California).
  2. Reproduce the plot according to three different kinds of color blindness, as well as a desaturated version.
  3. Reproduce the plot using a color blind safe palette.
  4. Download the file here denoting the region and division of each state.
    • Join the file with your honey file.
    • Produce a bar plot displaying the average honey for each state across years.
    • Use color to highlight the region of the country the state is from.
    • Note patterns you notice.
  5. Create a heatmap displaying the average honey production across years by region.
  6. Create at least one more plot of your choosing using color to distinguish, represent data values, or highlight. If you are interested in producing maps, I would recommend joining the data with the output from ggplot2::map_data("state"). Be careful with your keys so that you don’t end up with a many-to-many join. See here for additional help.

Finishing up

When you have finished the above, upload your rendered (knit) HTML file to Canvas.

1. Visualize the total production of honey across years by state. Use color to highlight the west coast (Washington, Oregon, and California).

## # A tibble: 6 x 8
##   state numcol yieldpercol totalprod   stocks priceperlb prodvalue  year
##   <chr>  <dbl>       <int>     <dbl>    <dbl>      <dbl>     <dbl> <int>
## 1 AL     16000          71   1136000   159000       0.72    818000  1998
## 2 AZ     55000          60   3300000  1485000       0.64   2112000  1998
## 3 AR     53000          65   3445000  1688000       0.59   2033000  1998
## 4 CA    450000          83  37350000 12326000       0.62  23157000  1998
## 5 CO     27000          72   1944000  1594000       0.7    1361000  1998
## 6 FL    230000          98  22540000  4508000       0.64  14426000  1998
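
The code that produced the plot for this prompt is not shown, so the following is only a minimal sketch of one way to highlight the west coast. It assumes the `state`, `totalprod`, and `year` columns printed above; the object names (`honey`, `p_highlight`) and the two colors are illustrative choices, not the original author's. Only the six printed rows (all from 1998) are used so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# The six rows printed above (all from 1998). For the real plot, read the
# full file instead, e.g. honey <- readr::read_csv("honeyproduction.csv").
honey <- tribble(
  ~state, ~totalprod, ~year,
  "AL",    1136000,   1998,
  "AZ",    3300000,   1998,
  "AR",    3445000,   1998,
  "CA",   37350000,   1998,
  "CO",    1944000,   1998,
  "FL",   22540000,   1998
)

# Flag the west coast states so they can be drawn in their own layer.
honey <- honey %>%
  mutate(west_coast = state %in% c("WA", "OR", "CA"))

# Gray layers give context; the colored layers are drawn last so the
# highlighted states sit on top. geom_point() is included only so that
# single-year data (as in this toy subset) is still visible.
p_highlight <- ggplot(honey, aes(year, totalprod, group = state)) +
  geom_line(data = filter(honey, !west_coast), color = "gray80") +
  geom_point(data = filter(honey, !west_coast), color = "gray80") +
  geom_line(data = filter(honey, west_coast), color = "#D55E00") +
  geom_point(data = filter(honey, west_coast), color = "#D55E00") +
  labs(x = "Year", y = "Total production",
       title = "Honey production by state, west coast highlighted")
```

Printing `p_highlight` draws the figure. With the full data set, each state becomes one line across years and Washington and Oregon are flagged as well.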

2. Reproduce the plot according to three different kinds of color blindness, as well as a desaturated version.
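
No code or figure survives for this prompt. One possible approach, sketched below, is to pass the plot's colors through the color vision deficiency simulators in the colorspace package (`deutan()`, `protan()`, `tritan()`) and through `desaturate()`, then redraw the plot once per simulated palette. The base colors and the object names (`base_cols`, `versions`, `make_plot`) are assumptions for illustration, and the data are again only the six rows printed earlier.

```r
library(ggplot2)
library(colorspace)

# Six printed rows (1998 only); the full honeyproduction.csv has more years.
honey <- data.frame(
  state     = c("AL", "AZ", "AR", "CA", "CO", "FL"),
  totalprod = c(1136000, 3300000, 3445000, 37350000, 1944000, 22540000),
  year      = 1998
)
honey$west_coast <- honey$state %in% c("WA", "OR", "CA")

# Assumed colors for the normal-vision plot: other states, then west coast.
base_cols <- c("#BBBBBB", "#D55E00")

# Simulate how those two colors appear under each condition.
versions <- list(
  deuteranopia = deutan(base_cols),
  protanopia   = protan(base_cols),
  tritanopia   = tritan(base_cols),
  desaturated  = desaturate(base_cols)
)

# Redraw the same plot with a given pair of colors.
make_plot <- function(cols, title) {
  ggplot(honey, aes(year, totalprod, group = state, color = west_coast)) +
    geom_line() +
    geom_point() +
    scale_color_manual(values = c(`FALSE` = cols[[1]], `TRUE` = cols[[2]])) +
    ggtitle(title)
}

# One plot per simulated palette, named after the condition.
plots <- Map(make_plot, versions, names(versions))
```

`plots$deuteranopia`, `plots$protanopia`, `plots$tritanopia`, and `plots$desaturated` can then be printed one at a time or arranged in a grid.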

3. Reproduce the plot using a color blind safe palette.
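
As a hedged illustration for this prompt, the sketch below uses the Okabe-Ito palette, a widely used color blind safe set, to give each west coast state its own color while every other state stays gray. The palette choice, the `highlight` column, and the object names are assumptions; the original answer may have used a different palette.

```r
library(ggplot2)

# Okabe-Ito palette hex values.
okabe_ito <- c(
  orange = "#E69F00", sky_blue = "#56B4E9", bluish_green = "#009E73",
  yellow = "#F0E442", blue = "#0072B2", vermillion = "#D55E00",
  reddish_purple = "#CC79A7", black = "#000000"
)

# Six printed rows (1998 only); swap in the full honeyproduction.csv data.
honey <- data.frame(
  state     = c("AL", "AZ", "AR", "CA", "CO", "FL"),
  totalprod = c(1136000, 3300000, 3445000, 37350000, 1944000, 22540000),
  year      = 1998
)

# Keep the state code for west coast states; lump everything else together.
honey$highlight <- ifelse(honey$state %in% c("WA", "OR", "CA"),
                          honey$state, "Other")

p_safe <- ggplot(honey, aes(year, totalprod, group = state, color = highlight)) +
  geom_line() +
  geom_point() +
  scale_color_manual(
    values = c(
      CA    = okabe_ito[["orange"]],
      OR    = okabe_ito[["sky_blue"]],
      WA    = okabe_ito[["bluish_green"]],
      Other = "gray80"
    ),
    name = NULL
  )
```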

4. Download the file here denoting the region and division of each state.

- Join the file with your honey file.
- Produce a bar plot displaying the average honey for each state across years.
- Use color to highlight the region of the country the state is from.
- Note patterns you notice.
## # A tibble: 6 x 8
##   state numcol yieldpercol totalprod   stocks priceperlb prodvalue  year
##   <chr>  <dbl>       <int>     <dbl>    <dbl>      <dbl>     <dbl> <int>
## 1 AL     16000          71   1136000   159000       0.72    818000  1998
## 2 AZ     55000          60   3300000  1485000       0.64   2112000  1998
## 3 AR     53000          65   3445000  1688000       0.59   2033000  1998
## 4 CA    450000          83  37350000 12326000       0.62  23157000  1998
## 5 CO     27000          72   1944000  1594000       0.7    1361000  1998
## 6 FL    230000          98  22540000  4508000       0.64  14426000  1998
## # A tibble: 6 x 4
##   state_name state region division          
##   <chr>      <chr> <chr>  <chr>             
## 1 Alaska     AK    West   Pacific           
## 2 Alabama    AL    South  East South Central
## 3 Arkansas   AR    South  West South Central
## 4 Arizona    AZ    West   Mountain          
## 5 California CA    West   Pacific           
## 6 Colorado   CO    West   Mountain
## # A tibble: 6 x 11
##   state numcol yieldpercol totalprod stocks priceperlb prodvalue  year
##   <chr>  <dbl>       <int>     <dbl>  <dbl>      <dbl>     <dbl> <int>
## 1 AL     16000          71   1136000 1.59e5       0.72    818000  1998
## 2 AZ     55000          60   3300000 1.48e6       0.64   2112000  1998
## 3 AR     53000          65   3445000 1.69e6       0.59   2033000  1998
## 4 CA    450000          83  37350000 1.23e7       0.62  23157000  1998
## 5 CO     27000          72   1944000 1.59e6       0.7    1361000  1998
## 6 FL    230000          98  22540000 4.51e6       0.64  14426000  1998
## # … with 3 more variables: state_name <chr>, region <chr>, division <chr>
## # A tibble: 6 x 3
## # Groups:   state [6]
##   state region mean_honey
##   <chr> <chr>       <dbl>
## 1 AL    South     825467.
## 2 AR    South    2810400 
## 3 AZ    West     2032267.
## 4 CA    West    23169000 
## 5 CO    West     1750600 
## 6 FL    South   16469867.
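
The printed tables above are consistent with a join on `state` followed by a per-state mean, although the original code is not shown. The sketch below reproduces that sequence on a small subset: six honey rows from the first table and a region lookup assembled from the printed outputs (Florida's region is taken from the summary table). The object names `regions`, `state_avg`, and `p_bar` are illustrative, and because the subset covers 1998 only, the means here differ from the all-years values printed above.

```r
library(dplyr)
library(ggplot2)

# Six printed honey rows (1998 only).
honey <- tribble(
  ~state, ~totalprod, ~year,
  "AL",    1136000,   1998,
  "AZ",    3300000,   1998,
  "AR",    3445000,   1998,
  "CA",   37350000,   1998,
  "CO",    1944000,   1998,
  "FL",   22540000,   1998
)

# Region lookup built from the printed outputs (one row per state).
regions <- tribble(
  ~state, ~region,
  "AK", "West",
  "AL", "South",
  "AR", "South",
  "AZ", "West",
  "CA", "West",
  "CO", "West",
  "FL", "South"
)

# The lookup key must be unique, otherwise the join would duplicate rows.
stopifnot(!anyDuplicated(regions$state))

# Join, then average production per state across the available years.
state_avg <- honey %>%
  left_join(regions, by = "state") %>%
  group_by(state, region) %>%
  summarise(mean_honey = mean(totalprod), .groups = "drop")

# Bars ordered by average production and filled by region.
p_bar <- ggplot(state_avg,
                aes(reorder(state, mean_honey), mean_honey, fill = region)) +
  geom_col() +
  coord_flip() +
  labs(x = "State", y = "Average production", fill = "Region")
```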

## The Dakotas are the states that produce the most honey, and the Midwest is the region with the highest production. I question the classification of Montana as a West-region state and would argue that it belongs in the Midwest, which would further bolster the Midwest's honey production as a whole.

5. Create a heatmap displaying the average honey production across years by region.
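
No output survives for the heatmap, so the following is a sketch under assumptions rather than the original answer: average `totalprod` within each region and year, then map that average to tile fill. It reuses the six printed honey rows and a region lookup taken from the printed tables, so only one year and two regions appear here. The names `region_year` and `p_heat` are illustrative.

```r
library(dplyr)
library(ggplot2)

# Six printed honey rows (1998 only).
honey <- tribble(
  ~state, ~totalprod, ~year,
  "AL",    1136000,   1998,
  "AZ",    3300000,   1998,
  "AR",    3445000,   1998,
  "CA",   37350000,   1998,
  "CO",    1944000,   1998,
  "FL",   22540000,   1998
)

# Region for each of those states, as shown in the printed outputs.
regions <- tribble(
  ~state, ~region,
  "AL", "South",
  "AR", "South",
  "AZ", "West",
  "CA", "West",
  "CO", "West",
  "FL", "South"
)

# One row per region and year holding the mean production.
region_year <- honey %>%
  left_join(regions, by = "state") %>%
  group_by(region, year) %>%
  summarise(mean_honey = mean(totalprod), .groups = "drop")

# Heatmap: years on x, regions on y, sequential viridis scale for the values.
p_heat <- ggplot(region_year, aes(factor(year), region, fill = mean_honey)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(name = "Mean production") +
  labs(x = "Year", y = "Region")
```

A sequential scale such as viridis fits here because the fill represents a data value rather than a group.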

6. Create at least one more plot of your choosing using color to distinguish, represent data values, or highlight. If you are interested in producing maps, I would recommend joining the data with the output from ggplot2::map_data("state"). Be careful with your keys so that you don’t end up with a many-to-many join. See here for additional help.

## # A tibble: 6 x 11
##   state numcol yieldpercol totalprod stocks priceperlb prodvalue  year
##   <chr>  <dbl>       <int>     <dbl>  <dbl>      <dbl>     <dbl> <int>
## 1 AL     16000          71   1136000 1.59e5       0.72    818000  1998
## 2 AZ     55000          60   3300000 1.48e6       0.64   2112000  1998
## 3 AR     53000          65   3445000 1.69e6       0.59   2033000  1998
## 4 CA    450000          83  37350000 1.23e7       0.62  23157000  1998
## 5 CO     27000          72   1944000 1.59e6       0.7    1361000  1998
## 6 FL    230000          98  22540000 4.51e6       0.64  14426000  1998
## # … with 3 more variables: state_name <chr>, region <chr>, division <chr>

##        long      lat group order state_name subregion
## 1 -87.46201 30.38968     1     1    Alabama      <NA>
## 2 -87.48493 30.37249     1     2    Alabama      <NA>
## 3 -87.52503 30.37249     1     3    Alabama      <NA>
## 4 -87.53076 30.33239     1     4    Alabama      <NA>
## 5 -87.57087 30.32665     1     5    Alabama      <NA>
## 6 -87.58806 30.32665     1     6    Alabama      <NA>

## # A tibble: 6 x 16
##   state numcol yieldpercol totalprod stocks priceperlb prodvalue  year
##   <chr>  <dbl>       <int>     <dbl>  <dbl>      <dbl>     <dbl> <int>
## 1 AL     16000          71   1136000 159000       0.72    818000  1998
## 2 AL     16000          71   1136000 159000       0.72    818000  1998
## 3 AL     16000          71   1136000 159000       0.72    818000  1998
## 4 AL     16000          71   1136000 159000       0.72    818000  1998
## 5 AL     16000          71   1136000 159000       0.72    818000  1998
## 6 AL     16000          71   1136000 159000       0.72    818000  1998
## # … with 8 more variables: state_name <chr>, region <chr>, division <chr>,
## #   long <dbl>, lat <dbl>, group <dbl>, order <int>, subregion <chr>
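
The final table repeats the same Alabama 1998 record once per polygon point, which is what a join between multi-year honey data and `map_data("state")` produces. One way to stay clear of a many-to-many join, sketched here as an assumption rather than the original approach, is to reduce the honey data to one row per state before joining it to the map. `map_data()` needs the maps package installed; the title-casing step with `tools::toTitleCase()` and the object names are illustrative. Only five states with names confirmed by the printed tables are included.

```r
library(dplyr)
library(ggplot2)

# One row per state: 1998 production for five states shown in the output.
honey_1998 <- tribble(
  ~state_name,  ~totalprod,
  "Alabama",     1136000,
  "Arizona",     3300000,
  "Arkansas",    3445000,
  "California", 37350000,
  "Colorado",    1944000
)

# The key must be unique on the honey side so each map point matches
# at most one honey row (a many-to-one join from the map's perspective).
stopifnot(!anyDuplicated(honey_1998$state_name))

# map_data("state") stores lowercase names in `region`; rename the column
# and title-case it so it lines up with `state_name`.
states_map <- map_data("state") %>%
  rename(state_name = region) %>%
  mutate(state_name = tools::toTitleCase(state_name))

# Left join keeps every polygon point; states without data get NA.
map_honey <- left_join(states_map, honey_1998, by = "state_name")

# Choropleth: fill encodes production, missing states are light gray.
p_map <- ggplot(map_honey, aes(long, lat, group = group, fill = totalprod)) +
  geom_polygon(color = "white") +
  coord_quickmap() +
  scale_fill_viridis_c(name = "Production, 1998", na.value = "gray90")
```

Because the join preserves the row count of `states_map`, comparing `nrow(map_honey)` with `nrow(states_map)` is a quick check that no rows were duplicated.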